Abstract

We present a closed dependent type theory whose inductive types 
are given not by a scheme for generative declarations, but by encod- 
ing in a universe. Each inductive datatype arises by interpreting its 
description — a first-class value in a datatype of descriptions. More- 
over, the latter itself has a description. Datatype-generic program- 
ming thus becomes ordinary programming. We show some of the 
resulting generic operations and deploy them in particular, useful 
ways on the datatype of datatype descriptions itself. Surprisingly,
this apparently self-supporting setup is achievable without paradox
or infinite regress.

1. Introduction 

Dependent datatypes, such as the ubiquitous vectors (lists indexed
by length), express relative notions of data validity. They allow us to
function in a complex world with a higher standard of basic hygiene 
than is practical with the context-free datatypes of ML-like lan- 
guages. Dependent type systems, as found in Agda [Norell 2007], 
Coq [The Coq Development Team 2009], Epigram [McBride and 
McKinna 2004], and contemporary Haskell [Peyton Jones et al. 
2006], are beginning to make themselves useful. As with rope, the 
engineering benefits of type indexing sometimes outweigh the dif- 
ficulties you can arrange with enough of it. 

The blessing of expressing just the right type for the job can 
also be a curse. Where once we might have had a small collection 
of basic datatypes and a large library, we now must cope with a 
cornucopia of finely confected structures, subtly designed, subtly 
different. The basic vector equipment is much like that for lists, 
but we implement it separately, often retyping the same code. The 
Agda standard library [Danielsson], for example, sports a writhing 
mass of list-like structures, including vectors, bounded-length lists, 
difference lists, reflexive-transitive closures — the list is petrifying. 
Here, we seek equipment to tame this gorgon's head with reflection. 

The business of belonging to a datatype is itself a notion rel- 
ative to the type's declaration. Most typed functional languages, 
including those with dependent types, feature a datatype declara- 
tion construct, external to and extending the language for defining 
values and programs. However, dependent type systems also allow 
us to reflect types as the image of a function from a set of 'codes' — 
a universe construction [Martin-Löf 1984]. Computing with codes,
we expose operations on and relationships between the types they
reflect. Here, we adopt the universe as our guiding design principle. 
We abolish the datatype declaration construct, by reflecting it as a 
datatype of datatype descriptions which, moreover, describes itself. 
This apparently self-supporting construction is a trick, of course, 
but we shall show the art of it. We contribute 

• a closed type theory, extensible only definitionally, nonetheless 
equipped with a universe of inductive families of datatypes; 

• a self-encoding of the universe codes as a datatype in the 
universe — datatype generic programming is just programming; 

• a bidirectional type propagation mechanism to conceal artefacts 
of the encoding, restoring a convenient presentation of data; 

• examples of generic operations and constructions over our uni- 
verse, notably the free monad construction; 

• datatype generic programming delivered directly, not via some 
isomorphic model or 'view' of declared types. 

We study two universes as a means to explore this novel way 
to equip a programming language with its datatypes. We warm up 
with a universe of simple datatypes, just sufficient to describe itself. 
Once we have learned this art, we scale up to indexed datatypes, en- 
compassing the inductive families [Dybjer 1991; Luo 1994] found 
in Coq and Epigram, and delivering experiments in generic pro- 
gramming with applications to the datatype of codes itself. 

We aim to deliver proof of concept, showing that a closed the- 
ory with a self-encoding universe of datatypes can be made practi- 
cable, but we are sure there are bigger and better universes waiting 
for a similar treatment. Benke, Dybjer and Jansson [Benke et al. 
2003] provide a useful survey of the possibilities, including exten- 
sion to inductive-recursive definition, whose closed-form presenta- 
tion [Dybjer and Setzer 1999, 2000] is both an inspiration for the 
present enterprise, and a direction for future study. 

The work of Morris, Altenkirch and Ghani [Morris 2007; Mor- 
ris and Altenkirch 2009; Morris et al. 2009] on (indexed) containers 
has informed our style of encoding and the equipment we choose 
to develop, but the details here reflect pragmatic concerns about in- 
tensional properties which demand care in practice. We have thus 
been able to implement our work as the basis for datatypes in the 
Epigram 2 prototype [Brady et al. 2009]. We have also developed a 
stratified model of our coding scheme in Agda.¹



2. The Type Theory 

One challenge in writing this paper is to extricate our account of 
datatypes from what else is new in Epigram 2. In fact, we demand 
relatively little from the setup, so we shall start with a 'vanilla' 
theory and add just what we need. The reader accustomed to dependent
types will recognise the basis of her favourite system; for
those less familiar, we try to keep the presentation self-contained. 

2.1 Base theory 

We adopt a traditional presentation for our type theory, with three 
mutually defined systems of judgments: context validity, typing, 
and equality, with the following forms: 

$\Gamma \vdash \mathrm{VALID}$    Γ is a valid context, giving types to variables

$\Gamma \vdash t : T$    term t has type T in context Γ

$\Gamma \vdash s \equiv t : T$    s and t are equal at type T in context Γ

The rules are formulated to ensure that the following 'sanity 
checks' hold by induction on derivations 



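$$\Gamma \vdash t : T \;\Rightarrow\; \Gamma \vdash \mathrm{VALID} \;\wedge\; \Gamma \vdash T : \mathrm{SET}$$

$$\Gamma \vdash s \equiv t : T \;\Rightarrow\; \Gamma \vdash s : T \;\wedge\; \Gamma \vdash t : T$$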



and that judgments J are preserved by well-typed instantiation. 

$$\Gamma; x{:}S; \Delta \vdash J \;\Rightarrow\; \Gamma \vdash s : S \;\Rightarrow\; \Gamma; \Delta[s/x] \vdash J[s/x]$$

We specify equality as a judgment, leaving open the details of
its implementation, requiring only a congruence including ordinary
computation (β-rules), decided, e.g., by testing α-equivalence of
β-normal forms [Adams 2006]. Coquand and Abel feature prominently
in a literature of richer equalities, involving η-expansion,
proof-irrelevance and other attractions [Abel et al. 2009; Coquand
1996]. Agda and Epigram 2 support such features; Coq currently
does not, but they are surplus to requirements here.

Context validity ensures that variables inhabit well-formed sets. 



$$\frac{}{\vdash \mathrm{VALID}} \qquad \frac{\Gamma \vdash S : \mathrm{SET}}{\Gamma; x{:}S \vdash \mathrm{VALID}}\; x \notin \Gamma$$



The basic typing rules for tuples and functions are also standard, 
save that we locally adopt SET : SET, putting presentation be- 
fore paradox [Girard 1972]. The usual remedies apply, stratifying 
SET [Courant 2002; Harper and Pollack 1991; Luo 1994]. 



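$$\frac{\Gamma; x{:}S; \Delta \vdash \mathrm{VALID}}{\Gamma; x{:}S; \Delta \vdash x : S} \qquad \frac{\Gamma \vdash \mathrm{VALID}}{\Gamma \vdash \mathrm{SET} : \mathrm{SET}} \qquad \frac{\Gamma \vdash s : S \quad \Gamma \vdash S \equiv T : \mathrm{SET}}{\Gamma \vdash s : T}$$

$$\frac{\Gamma \vdash \mathrm{VALID}}{\Gamma \vdash 1 : \mathrm{SET}} \qquad \frac{\Gamma \vdash \mathrm{VALID}}{\Gamma \vdash [] : 1}$$

$$\frac{\Gamma \vdash S : \mathrm{SET} \quad \Gamma; x{:}S \vdash T : \mathrm{SET}}{\Gamma \vdash (x{:}S) \times T : \mathrm{SET}} \qquad \frac{\Gamma \vdash s : S \quad \Gamma; x{:}S \vdash T : \mathrm{SET} \quad \Gamma \vdash t : T[s/x]}{\Gamma \vdash [s, t]_{x.T} : (x{:}S) \times T}$$

$$\frac{\Gamma \vdash p : (x{:}S) \times T}{\Gamma \vdash \pi_0\, p : S} \qquad \frac{\Gamma \vdash p : (x{:}S) \times T}{\Gamma \vdash \pi_1\, p : T[\pi_0\, p / x]}$$

$$\frac{\Gamma \vdash S : \mathrm{SET} \quad \Gamma; x{:}S \vdash T : \mathrm{SET}}{\Gamma \vdash (x{:}S) \to T : \mathrm{SET}} \qquad \frac{\Gamma \vdash S : \mathrm{SET} \quad \Gamma; x{:}S \vdash t : T}{\Gamma \vdash \lambda_S x.\, t : (x{:}S) \to T}$$

$$\frac{\Gamma \vdash f : (x{:}S) \to T \quad \Gamma \vdash s : S}{\Gamma \vdash f\, s : T[s/x]}$$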



Notation. We subscript information needed for type synthesis but
not type checking, e.g., the domain of a λ-abstraction, and suppress
it informally where clear. Square brackets denote tuples, with a
LISP-like right-nesting convention: [a b] abbreviates [a, [b, []]].

The judgmental equality comprises the computational rules below,
closed under reflexivity, symmetry, transitivity and structural
congruence, even under binders. We omit the mundane rules which
ensure these closure properties for reasons of space.

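$$\frac{\Gamma \vdash S : \mathrm{SET} \quad \Gamma; x{:}S \vdash t : T \quad \Gamma \vdash s : S}{\Gamma \vdash (\lambda_S x.\, t)\, s \equiv t[s/x] : T[s/x]}$$

$$\frac{\Gamma \vdash s : S \quad \Gamma; x{:}S \vdash T : \mathrm{SET} \quad \Gamma \vdash t : T[s/x]}{\Gamma \vdash \pi_0\, ([s, t]_{x.T}) \equiv s : S} \qquad \frac{\Gamma \vdash s : S \quad \Gamma; x{:}S \vdash T : \mathrm{SET} \quad \Gamma \vdash t : T[s/x]}{\Gamma \vdash \pi_1\, ([s, t]_{x.T}) \equiv t : T[s/x]}$$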

Given a suitable stratification of SET, the computation rules yield 
a terminating evaluation procedure, ensuring the decidability of 
equality and thence type checking. 

2.2 Finite enumerations of tags 

It is time for our first example of a universe. You might want to offer
a choice of named constructors in your datatypes: we shall equip
you with sets of tags to choose from. Our plan is to implement (by
extending the theory, or by encoding) the signature

$$\mathrm{En} : \mathrm{SET} \qquad \#(E : \mathrm{En}) : \mathrm{SET}$$

where some value E : En in the 'enumeration universe' describes a 
type of tag choices #E. We shall need some tags — valid identifiers, 
marked to indicate that they are data, not variables scoped and 
substitutable — so we hardwire these rules: 



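$$\frac{\Gamma \vdash \mathrm{VALID}}{\Gamma \vdash \mathrm{Tag} : \mathrm{SET}} \qquad \frac{\Gamma \vdash \mathrm{VALID} \quad s \text{ a valid identifier}}{\Gamma \vdash \text{'}s : \mathrm{Tag}}$$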

Let us describe enumerations as lists of tags, with signature: 

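$$\mathrm{nE} : \mathrm{En} \qquad \mathrm{cE}\ (t : \mathrm{Tag})\ (E : \mathrm{En}) : \mathrm{En}$$

What are the values in #E? Formally, we represent the choice of a
tag as a numerical index into E, via new rules:

$$\frac{\Gamma \vdash \mathrm{VALID}}{\Gamma \vdash 0 : \#(\mathrm{cE}\ t\ E)} \qquad \frac{\Gamma \vdash n : \#E}{\Gamma \vdash 1{+}\,n : \#(\mathrm{cE}\ t\ E)}$$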

However, we expect that in practice, you might rather refer to these 
values by tag, and we shall ensure that this is possible in due course. 

Enumerations come with further machinery. Each #E needs 
an eliminator, allowing us to branch according to a tag choice. For- 
mally, whenever we need such new computational facilities, we add 
primitive operators to the type theory and extend the judgmental 
equality with their computational behavior. However, for compact- 
ness and readability, we shall write these operators as functional 
programs (much as we model them in Agda). 

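We first define the 'small product' π operator:

$$\begin{array}{l} \pi : (E : \mathrm{En})(P : \#E \to \mathrm{SET}) \to \mathrm{SET} \\ \pi\ \mathrm{nE}\ P \mapsto 1 \\ \pi\ (\mathrm{cE}\ t\ E)\ P \mapsto P\,0 \times \pi\ E\ (\lambda x.\ P\,(1{+}\,x)) \end{array}$$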

This builds a right-nested tuple type, demanding an object of P i for 
each i in the given finite domain. We can see these tuples as 'jump 
tables' tabulating dependently typed functions from the domain. 
We give this functional interpretation — the eliminator we need — 
by the switch operator, which, unsurprisingly, iterates projection: 

$$\begin{array}{l} \mathrm{switch} : (E : \mathrm{En})(P : \#E \to \mathrm{SET}) \to \pi\ E\ P \to (x : \#E) \to P\,x \\ \mathrm{switch}\ (\mathrm{cE}\ t\ E)\ P\ b\ 0 \mapsto \pi_0\, b \\ \mathrm{switch}\ (\mathrm{cE}\ t\ E)\ P\ b\ (1{+}\,x) \mapsto \mathrm{switch}\ E\ (\lambda x.\ P\,(1{+}\,x))\ (\pi_1\, b)\ x \end{array}$$

The π and switch operators deliver dependent elimination for
finite enumerations, but are rather awkward to use directly. We do
not write the range for a λ-abstraction, so it is galling to supply
P for functions defined by switch. Let us therefore find a way to
recover the tedious details of the encoding from types.
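
The shape of these two operators can be mimicked in an untyped setting. The following Python sketch is an illustration only, not the paper's implementation: it assumes that an enumeration code is a tuple of tag names and that an element of #E is a plain integer index, and the names `small_product` and `switch` are ours. It cannot express the dependent types, only the right-nested jump table and the iterated projection.

```python
def small_product(enum, P):
    """Mirror the small product: a right-nested tuple holding P(i) for each index i."""
    if not enum:
        return ()  # the empty enumeration gives the empty tuple
    # head entry is P(0); the tail is built for the shifted family i -> P(i + 1)
    return (P(0), small_product(enum[1:], lambda i: P(i + 1)))


def switch(enum, table, index):
    """Select the entry for `index` from a right-nested jump table."""
    if not enum:
        raise IndexError("the empty enumeration has no elements")
    head, tail = table
    if index == 0:
        return head  # first projection
    return switch(enum[1:], tail, index - 1)  # recurse on the second projection


colours = ("red", "green", "blue")
table = ("stop", ("go", ("wait", ())))
print(switch(colours, table, 1))  # prints go
```

The table has one entry per tag, so supplying a table is the same information as supplying a function out of the enumeration.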
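Judgment forms: synthesis $\Gamma \Vdash \mathit{exprEx} \rhd \mathit{term} \in \mathit{type}$; checking $\Gamma \Vdash \mathit{type} \ni \mathit{exprIn} \rhd \mathit{term}$.

$$\frac{\Gamma \Vdash \mathrm{SET} \ni T \rhd T' \quad \Gamma \Vdash T' \ni t \rhd t'}{\Gamma \Vdash (t : T) \rhd t' \in T'} \qquad \frac{\Gamma; x{:}S; \Delta \vdash \mathrm{VALID}}{\Gamma; x{:}S; \Delta \Vdash x \rhd x \in S}$$

$$\frac{\Gamma \Vdash f \rhd f' \in (x{:}S) \to T \quad \Gamma \Vdash S \ni s \rhd s'}{\Gamma \Vdash f\, s \rhd f'\, s' \in T[s'/x]}$$

$$\frac{\Gamma \Vdash p \rhd p' \in (x{:}S) \times T}{\Gamma \Vdash \pi_0\, p \rhd \pi_0\, p' \in S} \qquad \frac{\Gamma \Vdash p \rhd p' \in (x{:}S) \times T}{\Gamma \Vdash \pi_1\, p \rhd \pi_1\, p' \in T[\pi_0\, p' / x]}$$

Figure 1. Type synthesis

2.3 Type propagation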

Our approach to tidying the coding cruft is deeply rooted in 
the bidirectional presentation of type checking from Pierce and 
Turner [Pierce and Turner 1998]. They divide type inference into 
two communicating components. In type synthesis, types are pulled 
out of terms. A typical example is a variable in the context: 

$$\frac{\Gamma; x{:}S; \Delta \vdash \mathrm{VALID}}{\Gamma; x{:}S; \Delta \vdash x : S}$$

Because the context stores the type of the variable, we can extract 
the type whenever the variable is used. 

On the other hand, in the type checking phase, types are pushed 
into terms. We are handed a type together with a term, our task 
consists of checking that the type admits the term. In doing so, we 
can and should use the information provided by the type. Therefore, 
we can relax our requirements on the term. Consider λ-abstraction:

$$\frac{\Gamma \vdash S : \mathrm{SET} \quad \Gamma; x{:}S \vdash t : T}{\Gamma \vdash \lambda_S x.\, t : (x{:}S) \to T}$$

The official rules require an annotation specifying the domain.
However, in type checking, the Π-type we push in determines the
domain, so we can drop the annotation.

We adapt this idea, yielding a type propagation system, whose
purpose is to elaborate compact expressions into the terms of
our underlying type theory, much as in the definition of
Epigram 1 [McBride and McKinna 2004]. We divide expressions
into two syntactic categories: exprIn, into which types are pushed,
and exprEx, from which types are extracted. In the bidirectional
spirit, the exprIn are subject to type checking, while the exprEx
(variables and elimination forms) admit type synthesis. We embed
exprEx into exprIn, demanding that the synthesised type coincides
with the type proposed. The other direction, only necessary to
apply abstractions or project from pairs, takes a type annotation.

Type synthesis (Fig. 1) is the source of types. It follows the 
exprEx syntax, delivering both the elaborated term and its type. 
Terms and expressions never mix: e.g., for application, we instan- 
tiate the range with the term delivered by checking the argument 
expression. Hardwired operators are checked as variables. 

Dually, type checking judgments (Fig. 2) are sinks for types. 
From an exprIn and a type pushed into it, they elaborate a low-level
term, extracting information from the type. Note that we
inductively ensure the following 'sanity checks': 

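$$\Gamma \Vdash e \rhd t \in T \;\Rightarrow\; \Gamma \vdash t : T$$

$$\Gamma \Vdash T \ni e \rhd t \;\Rightarrow\; \Gamma \vdash t : T$$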
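
To make the two directions concrete, here is a minimal Python sketch of bidirectional elaboration for a simply typed fragment only (no dependency, no pairs, no enumerations). It is an assumption-laden illustration rather than the propagation system of Figures 1 and 2: terms and types are encoded as tuples, and the names `synth` and `check` are ours. As in the text, `synth` returns an elaborated term together with its type, while `check` takes the type as input and fills in the domain annotation of an abstraction.

```python
def synth(ctx, e):
    """Type synthesis: return (elaborated term, type) for an exprEx."""
    kind = e[0]
    if kind == "var":  # the context stores the type of the variable
        return e, ctx[e[1]]
    if kind == "app":  # synthesise the function, then check the argument
        f, f_ty = synth(ctx, e[1])
        if f_ty[0] != "fun":
            raise TypeError("applied term is not a function")
        arg = check(ctx, f_ty[1], e[2])
        return ("app", f, arg), f_ty[2]
    if kind == "ann":  # an annotation (t : T) makes a checkable term synthesisable
        return check(ctx, e[2], e[1]), e[2]
    raise TypeError("no synthesis rule for " + kind)


def check(ctx, ty, e):
    """Type checking: push `ty` into an exprIn and return the elaborated term."""
    if e[0] == "lam":  # the type pushed in supplies the domain
        if ty[0] != "fun":
            raise TypeError("abstraction checked against a non-function type")
        body = check({**ctx, e[1]: ty[1]}, ty[2], e[2])
        return ("lam", e[1], ty[1], body)  # the elaborated term carries its domain
    term, got = synth(ctx, e)  # embedding of exprEx into exprIn
    if got != ty:
        raise TypeError("synthesised type does not match the type pushed in")
    return term


NAT = ("base", "Nat")
identity = ("lam", "x", ("var", "x"))  # no domain annotation in the source expression
print(check({}, ("fun", NAT, NAT), identity))
```

An unannotated abstraction has no synthesis rule, so `synth({}, identity)` raises `TypeError`; wrapping it as `("ann", identity, ("fun", NAT, NAT))` restores synthesis, which is the role the annotation plays in the text.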

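Canonical set-formers are checked: we could exploit SET : SET
to give them synthesis rules, but this would prejudice our
future stratification plans. Note that abstraction and pairing are
free of annotation, as promised. Most of the propagation rules
are unremarkably structural: we have omitted some mundane rules
which just follow the pattern, e.g., for Tag.

$$\frac{\Gamma \Vdash s \rhd s' \in S \quad \Gamma \vdash S \equiv T : \mathrm{SET}}{\Gamma \Vdash T \ni s \rhd s'} \qquad \frac{\Gamma \vdash \mathrm{VALID}}{\Gamma \Vdash \mathrm{SET} \ni \mathrm{SET} \rhd \mathrm{SET}}$$

$$\frac{\Gamma \Vdash \mathrm{SET} \ni S \rhd S' \quad \Gamma; x{:}S' \Vdash \mathrm{SET} \ni T \rhd T'}{\Gamma \Vdash \mathrm{SET} \ni (x{:}S) \to T \rhd (x{:}S') \to T'} \qquad \frac{\Gamma; x{:}S \Vdash T \ni t \rhd t'}{\Gamma \Vdash (x{:}S) \to T \ni \lambda x.\, t \rhd \lambda_S x.\, t'}$$

$$\frac{\Gamma \Vdash \mathrm{SET} \ni S \rhd S' \quad \Gamma; x{:}S' \Vdash \mathrm{SET} \ni T \rhd T'}{\Gamma \Vdash \mathrm{SET} \ni (x{:}S) \times T \rhd (x{:}S') \times T'} \qquad \frac{\Gamma \Vdash S \ni s \rhd s' \quad \Gamma \Vdash T[s'/x] \ni t \rhd t'}{\Gamma \Vdash (x{:}S) \times T \ni [s, t] \rhd [s', t']_{x.T}}$$

$$\frac{\Gamma \Vdash (x{:}S) \to (y{:}T) \to U[[x, y]_{x.T}/p] \ni f \rhd f'}{\Gamma \Vdash (p : (x{:}S) \times T) \to U \ni \wedge f \rhd \lambda_{(x:S) \times T}\, p.\ f'\, (\pi_0\, p)\, (\pi_1\, p)}$$

$$\frac{\Gamma \vdash \mathrm{VALID}}{\Gamma \Vdash \mathrm{SET} \ni 1 \rhd 1} \qquad \frac{\Gamma \vdash \mathrm{VALID}}{\Gamma \Vdash 1 \ni [] \rhd []}$$

$$\frac{\Gamma \vdash \mathrm{VALID}}{\Gamma \Vdash \mathrm{En} \ni [] \rhd \mathrm{nE}} \qquad \frac{\Gamma \Vdash \mathrm{En} \ni E \rhd E'}{\Gamma \Vdash \mathrm{En} \ni [\text{'}t, E] \rhd \mathrm{cE}\ \text{'}t\ E'}$$

$$\frac{\Gamma \vdash E : \mathrm{En}}{\Gamma \Vdash \#(\mathrm{cE}\ \text{'}t\ E) \ni \text{'}t \rhd 0} \qquad \frac{\Gamma \Vdash \#E \ni \text{'}t \rhd n \quad \text{'}t \neq \text{'}t_0}{\Gamma \Vdash \#(\mathrm{cE}\ \text{'}t_0\ E) \ni \text{'}t \rhd 1{+}\,n}$$

$$\frac{\Gamma \vdash E : \mathrm{En}}{\Gamma \Vdash \#(\mathrm{cE}\ \text{'}t\ E) \ni 0 \rhd 0} \qquad \frac{\Gamma \Vdash \#E \ni n \rhd n'}{\Gamma \Vdash \#(\mathrm{cE}\ \text{'}t_0\ E) \ni 1{+}\,n \rhd 1{+}\,n'}$$

$$\frac{\Gamma \Vdash \pi\ E\ (\lambda_{\#E}\, x.\ T) \ni [\vec{t}\,] \rhd t'}{\Gamma \Vdash (x : \#E) \to T \ni [\vec{t}\,] \rhd \mathrm{switch}\ E\ (\lambda_{\#E}\, x.\ T)\ t'}$$

Figure 2. Type checking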

However, we also add abbreviations. We write ∧f, pronounced
'uncurry f', for the function which takes a pair and feeds it to f one
component at a time, letting us name them individually. Now, for
the finite enumerations, we go to work.

Firstly, we present the codes for enumerations as right-nested
tuples which, by our LISP convention, we write as unpunctuated
lists of tags ['t₀ … 'tₙ]. Secondly, we can denote an element by
its name: the type pushed in allows us to recover the numerical
index. We retain the numerical forms to facilitate generic
operations and ensure that shadowing is punished fittingly, not fatally.
Finally, we express functions from enumerations as tuples. Any
tuple-form, [] or [_, _], is accepted by the function space (the
generalised product) if it is accepted by the small product. Propagation
fills in the appeal to switch, copying the range information.
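
The second and third conveniences can be sketched in Python as follows. This is an untyped illustration under our own assumptions: an enumeration is a tuple of tag names, and `elaborate_tag` and `elaborate_function` are hypothetical names standing in for the corresponding propagation rules. The first function recovers a numerical index from a tag by consulting the enumeration pushed in; the second accepts a tuple of cases as a function from the enumeration.

```python
def elaborate_tag(enum, tag):
    """Check a tag against the enumeration pushed in and return its numerical index."""
    for index, candidate in enumerate(enum):
        if candidate == tag:
            # the first match wins; a shadowed later tag is still reachable by number
            return index
    raise TypeError(f"'{tag} is not a tag of {enum}")


def elaborate_function(enum, cases):
    """Accept a tuple of cases as a function from the enumeration."""
    if len(cases) != len(enum):
        raise TypeError("expected exactly one case per tag")
    table = tuple(cases)
    return lambda index: table[index]  # plays the role of the inserted switch


colour = ("red", "blue")
to_hex = elaborate_function(colour, ["#f00", "#00f"])
print(to_hex(elaborate_tag(colour, "blue")))  # prints #00f
```

With a duplicated tag, as in `("a", "a")`, the name always elaborates to index 0, while index 1 remains available in numerical form: shadowing is inconvenient but not fatal.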

Our interactive development tools also perform the reverse 
transformation for intelligible output. The encoding of any spe- 
cific enumeration is thus hidden by these translations. Only, and 
rightly, in enumeration-generic programs is the encoding exposed. 

Our type propagation mechanism does no constraint solving,
just copying, so it is just the thin end of the elaboration wedge.
It can afford us this 'assembly language' level of civilisation as
the En universe specifies not only the representation of the low-level
values in each set as bounded numbers, but also the presentation
of these values as high-level tags. To encode only the former,
we should merely need the size of enumerations, but we extract
more work from these types by making them more informative. We
have also, en passant, distinguished enumerations which have the
same cardinality but describe distinct notions: #['red 'blue] is not
#['green 'orange].



3. A Universe of Inductive Datatypes 

In this section, we describe an implementation of inductive types,
as we know them in ML-like languages. By working with familiar
datatypes, we hope to focus on the delivery mechanism,
warming up gently to the indexed datatypes we really want. We follow
Dybjer and Setzer's closed formulation of induction-recursion
[Dybjer and Setzer 1999], but without the '-recursion'. An impredicative
Church-style encoding of datatypes is not adequate for dependently
typed programming, as although such encodings present data as
non-dependent eliminators, they do not support dependent
induction [Geuvers 2001]. Whilst the λ-calculus captures all that data
can do, it cannot ultimately delimit all that data can be.

3.1 The power of Σ

In dependently typed languages, Σ-types can be interpreted as two
different generalisations. This duality is reflected in the notation
we can find in the literature. The notation $\Sigma_{x:A}(B\,x)$ stresses that
Σ-types are 'dependent sums', generalising sums over arbitrary
arities, where simply typed languages have finite sums.

On the other hand, our choice of notation $(x : A) \times (B\,x)$ emphasises
that Σ-types generalise products, with the type of the second
component depending on the value of the first, where simply typed
languages do not express such relative validity.

In ML-like languages, datatypes are presented as a sum-of- 
products. A datatype is defined by a finite sum of constructors, each 
carrying a product of arguments. To embrace these datatypes, we 
have to capture this grammar. With dependent types, the notion of 
sum-of-products translates into sigmas-of-sigmas. 

3.2 The universe of descriptions 

While sigmas-of-sigmas can give a semantics for the sum-of- 
products structure in each node of the tree-like values in a datatype, 
we need to account somehow for the recursive structure which ties 
these nodes together. Not for the first time, we do this by constructing
a universe [Martin-Löf 1984]. Universes are ubiquitous
in dependently typed programming [Benke et al. 2003; Oury and 
Swierstra 2008], but here we seek to exploit them as the foundation 
of our notion of datatypes. 

To add inductive types to our type theory, we build a universe
of datatype descriptions by implementing the signature presented
in Figure 3, with codes mimicking the grammar of datatype
declarations. We can read a description D : Desc as a 'pattern functor'
on SET, with ⟦D⟧ its action on an object, X, soon to be instantiated
recursively.

Descriptions are sequential structures, terminated by '1, indicating
the empty tuple. To build sigmas-of-sigmas, we define a 'Σ
code, interpreted as a Σ-type. To request a recursive component,
we have 'indx D, where D describes the rest of the node.

You may have noticed that we are a little coy about this pre- 
sentation, writing of 'implementing a signature' without clarifying 
how. A viable approach would simply be to extend the theory with 
constants for the constructors and an operator for ⟦D⟧. However, in
Section 4, you will see what we actually do. In the meantime, let us 
first gain some intuition for its use by developing some examples. 



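$$\begin{array}{l} \mathrm{Desc} : \mathrm{SET} \\ \text{'1} : \mathrm{Desc} \\ \text{'}\Sigma\ (S : \mathrm{SET})\ (D : S \to \mathrm{Desc}) : \mathrm{Desc} \\ \text{'indx}\ (D : \mathrm{Desc}) : \mathrm{Desc} \end{array}$$

$$\begin{array}{l} \llbracket \_ \rrbracket : \mathrm{Desc} \to \mathrm{SET} \to \mathrm{SET} \\ \llbracket \text{'1} \rrbracket\ X \mapsto 1 \\ \llbracket \text{'}\Sigma\ S\ D \rrbracket\ X \mapsto (s : S) \times \llbracket D\,s \rrbracket\ X \\ \llbracket \text{'indx}\ D \rrbracket\ X \mapsto X \times \llbracket D \rrbracket\ X \end{array}$$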

Figure 3. Universe of Descriptions 
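
The signature of Figure 3 can be approximated in Python to show how a description is interpreted. The sketch below is ours and makes several assumptions: a 'set' is modelled as a membership predicate, nodes are right-nested pairs ending in the empty tuple, and the function `inhabits` decides membership in the interpretation of a description rather than computing a type.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class One:
    """'1: the end of a node."""


@dataclass(frozen=True)
class Sigma:
    """'Σ S D: store an s in S, then continue with D s."""
    dom: Callable[[Any], bool]   # assumption: a set is a membership predicate
    rest: Callable[[Any], Any]


@dataclass(frozen=True)
class Indx:
    """'indx D: one recursive position, then D."""
    rest: Any


def inhabits(desc, in_x, value):
    """Decide whether `value` lies in the interpretation of `desc` at the set `in_x`."""
    if isinstance(desc, One):
        return value == ()
    if not (isinstance(value, tuple) and len(value) == 2):
        return False
    first, second = value
    if isinstance(desc, Sigma):
        return desc.dom(first) and inhabits(desc.rest(first), in_x, second)
    if isinstance(desc, Indx):
        return in_x(first) and inhabits(desc.rest, in_x, second)
    raise TypeError("not a description")


def is_int(x):
    return isinstance(x, int)


# A description with a Boolean choice: False ends the node, True asks for one element of X.
maybe_d = Sigma(lambda s: isinstance(s, bool), lambda s: Indx(One()) if s else One())
print(inhabits(maybe_d, is_int, (True, (3, ()))))  # prints True
```

The second component of a 'Σ node is checked against a description computed from the first, which is the dependency that the Σ-type expresses.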
3.3 Examples 

We begin with the natural numbers, now working in the high-level 
expression language of Section 2.3, exploiting type propagation. 

$$\begin{array}{l} \mathrm{NatD} : \mathrm{Desc} \\ \mathrm{NatD} \mapsto \text{'}\Sigma\ \#[\text{'zero}\ \text{'suc}]\ [\text{'1}\ (\text{'indx}\ \text{'1})] \end{array}$$

Let us explain its construction. First, we use 'Σ to give a choice
between the 'zero and 'suc constructors. What follows depends
on this choice, so we write the function computing the rest of the
description in tuple notation. In the 'zero case, we reach the end of
the description. In the 'suc case, we attach one recursive argument
and close the description. Translating the Σ to a binary sum, we
have effectively described the functor:

$$\llbracket \mathrm{NatD} \rrbracket\ Z \mapsto 1 + Z$$

Correspondingly, we can see the injections to the sum:

$$[\text{'zero}] : \llbracket \mathrm{NatD} \rrbracket\ Z \qquad [\text{'suc}\ (z : Z)] : \llbracket \mathrm{NatD} \rrbracket\ Z$$

With a small change to this definition, we obtain the pattern 
functor for lists: 

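$$\begin{array}{l} \mathrm{ListD} : \mathrm{SET} \to \mathrm{Desc} \\ \mathrm{ListD}\ X \mapsto \text{'}\Sigma\ \#[\text{'nil}\ \text{'cons}]\ [\text{'1}\ (\text{'}\Sigma\ X\ \lambda\_.\ \text{'indx}\ \text{'1})] \end{array}$$

The 'suc constructor is turned into a proper 'cons, taking an argument
in X followed by a recursive argument. This code describes
the following functor:

$$\llbracket \mathrm{ListD}\ X \rrbracket\ Z \mapsto 1 + X \times Z$$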

Finally, we are not limited to one recursive argument. This is 
demonstrated by our description of node-labelled binary trees: 

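$$\begin{array}{l} \mathrm{TreeD} : \mathrm{SET} \to \mathrm{Desc} \\ \mathrm{TreeD}\ X \mapsto \text{'}\Sigma\ \#[\text{'leaf}\ \text{'node}]\ [\text{'1}\ (\text{'indx}\ (\text{'}\Sigma\ X\ \lambda\_.\ \text{'indx}\ \text{'1}))] \end{array}$$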

Again, we are one evolutionary step away from ListD. However, 
instead of a single call to the induction code, we add another. The 
interpretation of this code corresponds to the following functor: 

$$\llbracket \mathrm{TreeD}\ X \rrbracket\ Z \mapsto 1 + Z \times X \times Z$$

From the examples above, we observe that datatypes are defined
by a 'Σ whose first argument enumerates the constructors. We call
codes fitting this pattern tagged descriptions. Again, this is a clear
reminder of the sum-of-products style. Every description can be
forced into this style with a singleton constructor if necessary. We
characterise tagged descriptions thus:

$$\begin{array}{l} \mathrm{TagDesc} : \mathrm{SET} \\ \mathrm{TagDesc} \mapsto (E : \mathrm{En}) \times (\pi\ E\ (\lambda\_.\ \mathrm{Desc})) \\ \mathrm{de} : \mathrm{TagDesc} \to \mathrm{Desc} \\ \mathrm{de} \mapsto \wedge \lambda E.\ \lambda D.\ \text{'}\Sigma\ \#E\ (\mathrm{switch}\ E\ (\lambda\_.\ \mathrm{Desc})\ D) \end{array}$$

It is not a great stretch to imagine that the traditional datatype 
declaration syntax might desugar to the definition of a datatype via 
a tagged description. 
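
As a rough illustration of tagged descriptions, the following Python sketch builds NatD, ListD and TreeD from a list of tags and a matching list of descriptions. It is not the paper's encoding: constructor choices are plain integer indices, sets are left uninterpreted, and `children` is a helper of our own that lists the recursive positions of a node, making the functor reading (for instance 1 + Z × X × Z for trees) observable.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class One:
    """'1: the end of a node."""


@dataclass(frozen=True)
class Sigma:
    """'Σ S D: store an s in S, then continue with D s."""
    dom: Any
    rest: Callable[[Any], Any]


@dataclass(frozen=True)
class Indx:
    """'indx D: one recursive position, then D."""
    rest: Any


def de(tags, descs):
    """Tagged description to description: choose a constructor index, then follow its code."""
    return Sigma(tags, lambda index: descs[index])


def children(desc, node):
    """List the recursive subobjects stored in one node, left to right."""
    if isinstance(desc, One):
        return []
    first, rest = node
    if isinstance(desc, Sigma):
        return children(desc.rest(first), rest)
    return [first] + children(desc.rest, rest)


nat_d = de(("zero", "suc"), (One(), Indx(One())))


def list_d(x):
    return de(("nil", "cons"), (One(), Sigma(x, lambda _: Indx(One()))))


def tree_d(x):
    return de(("leaf", "node"), (One(), Indx(Sigma(x, lambda _: Indx(One())))))


# A 'node with left subtree "l", label 5 and right subtree "r" (index 1 selects 'node).
print(children(tree_d(int), (1, ("l", (5, ("r", ()))))))  # prints ['l', 'r']
```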
3.4 The least fixpoint

So far, we have built pattern functors with our Desc universe. Being 
polynomial functors, they all admit a least fixpoint, which we now 
construct by tying the knot: the element type abstracted by the 
functor is now instantiated recursively: 

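$$\frac{\Gamma \vdash D : \mathrm{Desc}}{\Gamma \vdash \mu D : \mathrm{SET}} \qquad \frac{\Gamma \vdash D : \mathrm{Desc} \quad \Gamma \vdash d : \llbracket D \rrbracket\ (\mu D)}{\Gamma \vdash \mathrm{con}\ d : \mu D}$$

We can now build datatypes and their elements, e.g.:

$$\begin{array}{l} \mathrm{Nat} \mapsto \mu(\mathrm{de}\ [[\text{'zero}\ \text{'suc}],\ [\text{'1}\ (\text{'indx}\ \text{'1})]]) : \mathrm{SET} \\ \mathrm{con}\ [\text{'zero}] : \mathrm{Nat} \qquad \mathrm{con}\ [\text{'suc}\ (n : \mathrm{Nat})] : \mathrm{Nat} \end{array}$$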

But how shall we compute with our data? We should expect an 
elimination principle. Following a categorical intuition, we might 
provide the 'fold', or 'iterator', or 'catamorphism' : 

$$\mathrm{cata} : (D : \mathrm{Desc})(T : \mathrm{SET}) \to (\llbracket D \rrbracket\ T \to T) \to \mu D \to T$$

However, iteration is inadequate for dependent computation. We 
need induction to write functions whose type depends on inductive 
data. Following Benke et al. [2003], we adopt the following: 

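$$\begin{array}{l} \mathrm{ind} : (D : \mathrm{Desc})(P : \mu D \to \mathrm{SET}) \to \\ \qquad ((d : \llbracket D \rrbracket\ (\mu D)) \to \mathrm{All}\ D\ (\mu D)\ P\ d \to P\,(\mathrm{con}\ d)) \to \\ \qquad (x : \mu D) \to P\,x \\ \mathrm{ind}\ D\ P\ m\ (\mathrm{con}\ d) = m\ d\ (\mathrm{all}\ D\ (\mu D)\ P\ (\mathrm{ind}\ D\ P\ m)\ d) \end{array}$$

Here, All D X P d states that P : X → SET holds for every
subobject x : X in D, and all D X P p d is a 'dependent map',
applying some p : (x : X) → P x to each x contained in d.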
The full definition (including an extra case, introduced shortly) is 
presented in Figure 4. Note that ind is our first generic operation 
over descriptions, albeit a hardwired operator. Any datatype we 
define automatically comes with an induction principle. 
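
With the types erased, the computational content of ind can be sketched in Python. Everything here is an assumption made for illustration: `Con` wraps a node, `all_hyps` plays the part of all by applying a function to each recursive subobject, and `ind` passes the node together with the collected results to the method. The example adds natural numbers by recursion on the first argument.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class One:
    """'1: the end of a node."""


@dataclass(frozen=True)
class Sigma:
    """'Σ S D: store an s in S, then continue with D s."""
    dom: Any
    rest: Callable[[Any], Any]


@dataclass(frozen=True)
class Indx:
    """'indx D: one recursive subobject, then D."""
    rest: Any


@dataclass(frozen=True)
class Con:
    """con d: an element of the fixpoint, wrapping one node d."""
    node: Any


def all_hyps(desc, p, node):
    """Apply p to every recursive subobject of a node, as a right-nested tuple."""
    if isinstance(desc, One):
        return ()
    first, rest = node
    if isinstance(desc, Sigma):
        return all_hyps(desc.rest(first), p, rest)
    return (p(first), all_hyps(desc.rest, p, rest))


def ind(desc, method, x):
    """Untyped analogue of ind: call the method on the node and its induction hypotheses."""
    return method(x.node, all_hyps(desc, lambda y: ind(desc, method, y), x.node))


nat_d = Sigma(("zero", "suc"), lambda i: (One(), Indx(One()))[i])
zero = Con((0, ()))


def suc(n):
    return Con((1, (n, ())))


def plus(m, n):
    """Addition by induction on m: 'zero returns n, 'suc wraps the recursive result."""
    return ind(nat_d, lambda node, hyps: n if node[0] == 0 else suc(hyps[0]), m)


def to_int(n):
    return ind(nat_d, lambda node, hyps: 0 if node[0] == 0 else 1 + hyps[0], n)


two = suc(suc(zero))
three = suc(two)
print(to_int(plus(two, three)))  # prints 5
```

Because `ind` is written once against the description, the same function serves any datatype described this way; only the method changes.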

We note that the very same functors ⟦D⟧ also admit greatest
fixpoints, and we have indeed implemented coinductive types this 
way, but that is a story for another time. 

3.5 Extending type propagation 

We have now enough machinery to build and manipulate inductive 
types at a low level. Let us now apply cosmetic surgery to the 
syntactic overhead. We extend type checking of expressions: 

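$$\frac{\Gamma \Vdash \#E \ni \text{'}c \rhd n \quad \Gamma \Vdash \llbracket D\,n \rrbracket\ (\mu(\text{'}\Sigma\ \#E\ D)) \ni [\vec{t}\,] \rhd t'}{\Gamma \Vdash \mu(\text{'}\Sigma\ \#E\ D) \ni \text{'}c\ \vec{t} \rhd \mathrm{con}\ [n, t']}$$

Here 'c t⃗ denotes a tag 'applied' to a sequence of arguments, and
[t⃗] that sequence's repackaging as a right-nested tuple. Now we
can just write data directly.

$$\text{'zero} : \mathrm{Nat} \qquad \text{'suc}\ (n : \mathrm{Nat}) : \mathrm{Nat}$$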

Once again, the type explains the legible presentation, as well as 
the low-level representation. 

We may also simplify appeals to induction by type propagation, 
as we have done with functions from pairs and enumerations. 

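$$\frac{\Gamma \Vdash (d : \llbracket D \rrbracket\ (\mu D)) \to \mathrm{All}\ D\ (\mu D)\ (\lambda_{\mu D}\, x.\ P)\ d \to P[\mathrm{con}\ d / x] \ni f \rhd f'}{\Gamma \Vdash (x : \mu D) \to P \ni \iota f \rhd \mathrm{ind}\ D\ (\lambda_{\mu D}\, x.\ P)\ f'}$$

This abbreviation is no substitute for the dependent pattern matching
to which we are entitled in a high-level language built on top of
this theory [Goguen et al. 2006], but it does at least make 'assembly
language' programming mercifully brief, if hieroglyphic.

$$\begin{array}{l} \mathrm{plus} : \mathrm{Nat} \to \mathrm{Nat} \to \mathrm{Nat} \\ \mathrm{plus} \mapsto \iota \wedge [(\lambda\_.\ \lambda\_.\ \lambda y.\ y)\ \ (\lambda\_.\ \wedge \lambda h.\ \lambda\_.\ \lambda y.\ \text{'suc}\ (h\ y))] \end{array}$$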



This concludes our introduction to the universe of datatype 
descriptions. We have encoded sum-of-products datatypes from the 
simply-typed world as data and equipped them with computation. 
We have also made sure to hide the details by type propagation. 

4. Levitating the Universe of Descriptions 

In this section, we will fulfil our promises and show how we im- 
plement the signatures, first for the enumerations, and then for the 
codes of the Desc universe. Persuading this to perform was a per- 
ilous pedagogical peregrination for the protagonist. Our method 
was indeed to hardwire constants implementing the signatures 
specified above, in the first instance, but then attempt to replace 
them, step by step, with definitions: "Is 2 + 2 still 4?", "No, it's 
a loop!". But we did find a way, so now we hope to convey to the 
reader the dizzy feeling of levitation, without the falling. 

4.1 Implementing finite enumerations 

In Section 2.2, we specified the finite sets of tags. We are going to 
implement the En type former and its constructors. Recall: 

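$$\mathrm{En} : \mathrm{SET} \qquad \mathrm{nE} : \mathrm{En} \qquad \mathrm{cE}\ (t : \mathrm{Tag})\ (E : \mathrm{En}) : \mathrm{En}$$

The nE and cE constructors are just the 'nil' and 'cons' of ordinary
lists, with elements from Tag. Therefore, we implement:

$$\mathrm{En} \mapsto \mu(\mathrm{ListD}\ \mathrm{Tag}) \qquad \mathrm{nE} \mapsto \text{'nil} \qquad \mathrm{cE}\ t\ E \mapsto \text{'cons}\ t\ E$$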

Let us consider the consequences. We discover that the type the- 
ory does not need to be extended with a special type former En, or 
special constructors nE and cE. Moreover, the π E P operator,
computing tuple types of Ps by recursion on E, need not be hardwired:
we can just use the generic ind operator, as we would for any ordi- 
nary program. 

Note, however, that the universe decoder #E is hardwired, as 
are the primitive 0 and 1+ that we use for low-level values, and 
indeed the switch operator. We cannot dispose of data altogether! 
We have, however, gained the ordinariness of the enumeration 
codes, and hence of generic programs which manipulate them. Our 
next step is similar: we are going to condense the entire naming 
scheme of datatypes into itself. 
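
To illustrate the point that enumeration codes become ordinary data, the following Python sketch represents En as the fixpoint of a list description and recovers the small product by a generic fold instead of a dedicated recursion. It is an untyped approximation under our own assumptions: the string "Tag" is a placeholder for the set of tags, `fold` is a non-dependent stand-in for the generic ind operator, and the 'tuple type' is represented by the tuple of values P(0), P(1), and so on.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class One:
    """'1: the end of a node."""


@dataclass(frozen=True)
class Sigma:
    """'Σ S D: store an s in S, then continue with D s."""
    dom: Any
    rest: Callable[[Any], Any]


@dataclass(frozen=True)
class Indx:
    """'indx D: one recursive subobject, then D."""
    rest: Any


@dataclass(frozen=True)
class Con:
    """con d: an element of the fixpoint, wrapping one node d."""
    node: Any


def list_d(elem):
    return Sigma(("nil", "cons"), lambda i: (One(), Sigma(elem, lambda _: Indx(One())))[i])


EN_D = list_d("Tag")  # En as the fixpoint of ListD Tag; "Tag" is only a placeholder
nE = Con((0, ()))


def cE(t, e):
    return Con((1, (t, (e, ()))))


def fold(desc, alg, x):
    """Generic fold: replace each recursive subobject by its result, then apply alg."""
    def go(d, node):
        if isinstance(d, One):
            return ()
        first, rest = node
        if isinstance(d, Sigma):
            return (first, go(d.rest(first), rest))
        return (fold(desc, alg, first), go(d.rest, rest))
    return alg(go(desc, x.node))


def small_product(e, P):
    """The small product over e, obtained from the generic fold instead of being hardwired."""
    def alg(node):
        if node[0] == 0:  # 'nil: the empty tuple, whatever P is
            return lambda P: ()
        rec = node[1][1][0]  # result for the tail, itself a function of P
        return lambda P: (P(0), rec(lambda i: P(i + 1)))
    return fold(EN_D, alg, e)(P)


colours = cE("red", cE("blue", nE))
print(small_product(colours, lambda i: i * 10))  # prints (0, (10, ()))
```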

4.2 Implementing descriptions 

We shall now fulfil our implementation promises, encoding the 
universe of descriptions. In and of itself, the type of codes, Desc, is nothing
but a datatype. We are in the same situation as with En: we ought 
to be able to describe the codes of Desc in Desc itself. Hence, this 
code would be a first-class citizen, born with the standard, generic 
equipment of datatypes. 

4.2.1 First attempt 

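Our first attempt gets stuck quite quickly:

$$\begin{array}{l} \mathrm{DescD} : \mathrm{Desc} \\ \mathrm{DescD} \mapsto \mathrm{de}\ \left[ \begin{array}{l} \text{'1} \\ \text{'}\Sigma \\ \text{'indx} \end{array} \right] \left[ \begin{array}{l} \text{'1} \\ \text{'}\Sigma\ \mathrm{SET}\ (\lambda S.\ \{?\}) \\ \text{'indx}\ \text{'1} \end{array} \right] \end{array}$$

Let us explain where we stand. Much as we have done so far,
we first offer a constructor choice from '1, 'Σ and 'indx. The
reader will notice that the 'tagged' notation we have used for the
Desc constructors now fully makes sense: these were actually the
tags we are defining. For '1, we immediately reach the end of
the description. For 'indx, there is a single recursive argument.
Describing 'Σ is problematic. Recall the specification of 'Σ:

$$\text{'}\Sigma\ (S : \mathrm{SET})\ (D : S \to \mathrm{Desc}) : \mathrm{Desc}$$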

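So, we first pack a SET, S. We should then like a recursive argument
indexed by S, but that is an exponential, and our presentation
is entirely first-order so far, delivering only sums-of-products. To
code our universe, we must first enlarge it!

$$\begin{array}{l} \mathrm{All} : (D : \mathrm{Desc})(X : \mathrm{SET})(P : X \to \mathrm{SET})(xs : \llbracket D \rrbracket\ X) \to \mathrm{SET} \\ \mathrm{All}\ \text{'1}\ X\ P\ [] = 1 \\ \mathrm{All}\ (\text{'}\Sigma\ S\ D)\ X\ P\ [s, d] = \mathrm{All}\ (D\,s)\ X\ P\ d \\ \mathrm{All}\ (\text{'indx}\ D)\ X\ P\ [x, d] = P\,x \times \mathrm{All}\ D\ X\ P\ d \\ \mathrm{All}\ (\text{'hindx}\ H\ D)\ X\ P\ [f, d] = ((h : H) \to P\,(f\,h)) \times \mathrm{All}\ D\ X\ P\ d \end{array}$$

$$\begin{array}{l} \mathrm{all} : (D : \mathrm{Desc})(X : \mathrm{SET})(P : X \to \mathrm{SET})(p : (x : X) \to P\,x)(xs : \llbracket D \rrbracket\ X) \to \mathrm{All}\ D\ X\ P\ xs \\ \mathrm{all}\ \text{'1}\ X\ P\ p\ [] = [] \\ \mathrm{all}\ (\text{'}\Sigma\ S\ D)\ X\ P\ p\ [s, d] = \mathrm{all}\ (D\,s)\ X\ P\ p\ d \\ \mathrm{all}\ (\text{'indx}\ D)\ X\ P\ p\ [x, d] = [p\,x,\ \mathrm{all}\ D\ X\ P\ p\ d] \\ \mathrm{all}\ (\text{'hindx}\ H\ D)\ X\ P\ p\ [f, d] = [\lambda h.\ p\,(f\,h),\ \mathrm{all}\ D\ X\ P\ p\ d] \end{array}$$

Figure 4. Defining and collecting inductive hypotheses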

4.2.2 Second attempt 

In order to capture a notion of higher-order induction, we add a 
code 'hindx that takes an indexing set H. Intuitively, 'hindx gives 
a recursive subobject for each element of H. 

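$$\begin{array}{l} \text{'hindx}\ (H : \mathrm{SET})\ (D : \mathrm{Desc}) : \mathrm{Desc} \\ \llbracket \text{'hindx}\ H\ D \rrbracket\ X \mapsto (H \to X) \times \llbracket D \rrbracket\ X \end{array}$$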

Note that up to isomorphism, 'indx is subsumed by 'hindx 1. 
However, the apparent duplication has some value. Unlike its
counterpart, 'indx is first-order: we prefer not to demand dummy
functions from 1 in ordinary data, e.g. 'suc (λ_. n). It is naive to imagine
that up to isomorphism, any representation of data will do.
First-order representations are finitary by construction, and thus admit
a richer, componentwise decidable equality than functions may in
general possess.²

We are now able to describe our universe of datatypes: 

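$$\begin{array}{l} \mathrm{DescD} : \mathrm{Desc} \\ \mathrm{DescD} \mapsto \mathrm{de}\ \left[ \begin{array}{l} \text{'1} \\ \text{'}\Sigma \\ \text{'indx} \\ \text{'hindx} \end{array} \right] \left[ \begin{array}{l} \text{'1} \\ \text{'}\Sigma\ \mathrm{SET}\ \lambda S.\ \text{'hindx}\ S\ \text{'1} \\ \text{'indx}\ \text{'1} \\ \text{'}\Sigma\ \mathrm{SET}\ \lambda\_.\ \text{'indx}\ \text{'1} \end{array} \right] \end{array}$$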

The '1 and 'indx cases remain unchanged, as expected. We
successfully describe the 'Σ case, by a simple appeal to the
higher-order induction on S. The 'hindx case consists in packing a SET
with a recursive argument.

At a first glance, we have achieved our goal. We have described 
the codes of the universe of descriptions. Taking the fixpoint of 
⟦DescD⟧ gives us a datatype exactly like Desc. Might we be so
bold as to take Desc ↦ μDescD as the levitating definition? If we
do, we shall come down with a bump! To complete our levitation, 
just as in the magic trick, requires hidden assistance. Let us explain 
the problem and reveal the 'invisible cable' which fixes it. 

4.2.3 Final move 

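The definition Desc ↦ μDescD is circular, but the offensive
recursion is concealed by a prestidigitation. Expanding de,
and propagating types as in Figure 2, reveals the awful truth:

$$\mathrm{Desc} \mapsto \mu\left(\text{'}\Sigma\ \#[\text{'1}\ \text{'}\Sigma\ \text{'indx}\ \text{'hindx}]\ \left(\mathrm{switch}\ [\text{'1}\ \text{'}\Sigma\ \text{'indx}\ \text{'hindx}]\ (\lambda\_.\ \mathrm{Desc}) \left[ \begin{array}{l} \text{'1} \\ \text{'}\Sigma\ \mathrm{SET}\ \lambda S.\ \text{'hindx}\ S\ \text{'1} \\ \text{'indx}\ \text{'1} \\ \text{'}\Sigma\ \mathrm{SET}\ \lambda\_.\ \text{'indx}\ \text{'1} \end{array} \right] \right)\right)$$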



The recursion shows up only because we must specify the return
type of the general-purpose switch, and it is computing a Desc!
Although type propagation allows us to hide this detail when defining
a function, we cannot readily suppress this information and check
types when switch is fully applied.

² E.g., extensionally, there is one inhabitant of #[] → Nat; intensionally,
there is a countable infinitude which it is not safe to collapse.

We are too close to give up now. If only we did not need to 
supply that return type, especially when we know what it must be. 
We eliminate the recursion by specialising switch: 

switchD : (E : En) → (π E λ_. Desc) → #E → Desc

The magician's art rests here, in this extension. We conceal it 
behind a type propagation rule for switchD which we apply with 
higher priority than for switch in general. 

Γ ⊢ π E (λ_. Desc) ∋ [t⃗] ▹ t′
--------------------------------------
Γ ⊢ #E → Desc ∋ [t⃗] ▹ switchD E t′

As a consequence, our definition above now propagates without in- 
troducing recursion. Of course, by pasting together the declaration 
of Desc and its internal copy, we have made it appear in its own 
type. Hardwired as a fait accompli, this creates no regress, although 
one must assume the definition to recheck it. 

We have levitated Desc. Beyond its pedagogical value, this 
exercise has several practical outcomes. First of all, it reveals that 
the Desc universe is just plain data. As any piece of data, it can 
therefore be inspected and manipulated. Moreover, it is expressed 
in the Desc universe. As a consequence, it is equipped, for free, 
with an induction principle. So, our ability to inspect and program 
with Desc is not restricted to a meta-language: we now have all 
the necessary equipment in the theory to program over datatypes. 
Generic programming is just programming. 

4.3 The generic catamorphism 

In Section 3.4, we hardwired a dependent induction principle,
instead of the catamorphism. However, in some circumstances, the
full power of a dependent elimination is not necessary. Let us now
derive the catamorphism from the induction principle, ind.

The catamorphism is defined by induction on the description
D, with a readily propagated non-dependent return type T. Given
a node xs and the induction hypotheses, the method ought to build
an element of T. Provided that we know how to make an element
of ⟦D⟧ T, this step will be performed by the algebra f. Let us take
a look at this jigsaw:

cata : (D : Desc)(T : Set) → (⟦D⟧ T → T) → μD → T
cata D T f ↦ ind D (λ_. T) (λxs. λhs. f {?})

We are left with filling the hole. Recall that we have xs : ⟦D⟧ (μD)
and hs : All D (μD) (λ_. T) xs at hand. Our goal is to make
an element of ⟦D⟧ T. Intuitively, xs is of the right shape, but its
sub-elements are of the wrong type. On the other hand, for each
sub-element of xs, hs gives us the corresponding element in T.
Therefore, to construct an element of ⟦D⟧ T, we must replace
the recursive components of xs by their counterparts from hs. Let
us write a program to do that - please forgive us if we lapse into a
pattern matching notation, for readability's sake.

replace : (D : Desc)(X, Y : Set)
          (xs : ⟦D⟧ X) → All D X (λ_. Y) xs → ⟦D⟧ Y
replace '1           X Y []     []      ↦ []
replace ('Σ S D)     X Y [s, d] d′      ↦ [s, replace (D s) X Y d d′]
replace ('indx D)    X Y [x, d] [y, d′] ↦ [y, replace D X Y d d′]
replace ('hindx H D) X Y [f, d] [g, d′] ↦ [g, replace D X Y d d′]

Filling the hole in cata with replace D (μD) T xs hs closes the
problem. In the type theory, we have built a generic catamorphism.
Any datatype will now come equipped with this operation, for free.

With this example, we have shown how we can derive a generic 
operation, the catamorphism, from a pre-existing generic operation, 
the induction principle. This has been made possible by our ability 
to manipulate descriptions as first-class objects: the catamorphism 
is, basically, a function mapping a Desc to a datatype specific 
operation. This is a form of polytypic programming, as we learned 
from PolyP [Jansson and Jeuring 1997]. 
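
To make the jigsaw concrete outside the type theory, the following is a minimal untyped sketch in Python, under stated assumptions: descriptions are encoded as tagged tuples, the sets S and H carried by 'Σ and 'hindx are dropped because nothing checks them at run time, and the names Con, all_hyps, NAT_D and TREE_D are ours, introduced only for illustration. It mirrors the derivation of cata from ind through replace; it is not the implementation in the type theory.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Con:
    """One node of a fixpoint mu D: the constructor 'con' around a layer."""
    layer: Any


# Description codes (assumed encoding of '1, 'Σ, 'indx and 'hindx).
ONE = ("one",)
def sigma(branch): return ("sigma", branch)   # branch: s -> description
def indx(rest): return ("indx", rest)         # one recursive argument
def hindx(rest): return ("hindx", rest)       # a function of recursive arguments


def all_hyps(desc, p, xs):
    """Run p at every recursive position of the layer xs (the role of 'all')."""
    tag = desc[0]
    if tag == "one":
        return ()
    if tag == "sigma":
        s, d = xs
        return all_hyps(desc[1](s), p, d)
    x, d = xs
    if tag == "indx":
        return (p(x), all_hyps(desc[1], p, d))
    if tag == "hindx":
        return (lambda h: p(x(h)), all_hyps(desc[1], p, d))
    raise ValueError(f"unknown description code: {tag}")


def replace(desc, xs, hs):
    """Swap each recursive component of xs for its counterpart in hs."""
    tag = desc[0]
    if tag == "one":
        return ()
    if tag == "sigma":
        s, d = xs
        return (s, replace(desc[1](s), d, hs))
    if tag in ("indx", "hindx"):
        (_, d), (y, hs_rest) = xs, hs
        return (y, replace(desc[1], d, hs_rest))
    raise ValueError(f"unknown description code: {tag}")


def ind(desc, method, x):
    """Non-dependent reading of the induction principle: method gets xs and hs."""
    xs = x.layer
    return method(xs, all_hyps(desc, lambda y: ind(desc, method, y), xs))


def cata(desc, algebra, x):
    """The catamorphism, obtained by filling the hole with replace."""
    return ind(desc, lambda xs, hs: algebra(replace(desc, xs, hs)), x)


# Natural numbers: 'zero has no argument, 'suc has one recursive argument.
NAT_D = sigma(lambda c: ONE if c == "zero" else indx(ONE))
zero = Con(("zero", ()))
def suc(n): return Con(("suc", (n, ())))

def to_int(layer):
    c, args = layer
    return 0 if c == "zero" else 1 + args[0]

# Trees branching over booleans: 'node carries a function of subtrees.
TREE_D = sigma(lambda c: ONE if c == "leaf" else hindx(ONE))
leaf = Con(("leaf", ()))
def node(children): return Con(("node", (children, ())))

def size(layer):
    c, args = layer
    return 1 if c == "leaf" else 1 + args[0](True) + args[0](False)

three = cata(NAT_D, to_int, suc(suc(suc(zero))))
nodes = cata(TREE_D, size, node(lambda b: leaf if b else node(lambda _: leaf)))
```

The same cata serves both codes: only the description and the algebra change, which is the sense in which the operation comes for free with each datatype.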

4.4 The generic free monad 

In this section, we turn to a more ambitious generic operation
on datatypes. Given a functor, represented as a tagged description,
we build the free monad over this functor.

Let us recall the free monad construction. Given a functor F, 
the free monad over F is defined by the following datatype: 

data FreeMonad (F : Set → Set)(X : Set) : Set where
  Var       : X → FreeMonad F X
  Composite : F (FreeMonad F X) → FreeMonad F X

Being an inductive type, this FreeMonad datatype is itself defined
by a pattern functor. It is given by:

FreeMonadD F X Z ↦ X + F Z

In our setting, the free monad construction will take the functor 
as a tagged description, a set X of variables, and will compute the 
tagged description of the corresponding free monad. Implementing 
this function is surprisingly easy: 

_* : TagDesc → Set → TagDesc
[E, D]* X ↦ [['var, E], ['Σ X '1, D]]

We simply add a constructor, 'var, and define its argument to be
a 'Σ X '1, that is an element of X. We keep E and D as they
were, hence leaving the other constructors unchanged. Unfolding
the interpretation of this definition, we convince ourselves that this
corresponds to the functor FreeMonadD. The fixpoint operation
ties the knot and gives us the full-blown free monad construction.

Of course, we must equip the resulting datatypes with operations
delivering a monadic interface. As expected, λx. 'var x plays
the role of return, embedding variables into terms. The bind
operation corresponds to substitution. We will now implement it, as a
generic function.

Our implementation will appeal to the cata function developed
previously. So, let us write down the types, and fill in as many
arguments to cata as possible:

subst : (D : TagDesc)(X, Y : Set) → (X → μ(de (D* Y))) →
        μ(de (D* X)) → μ(de (D* Y))
subst D X Y σ ↦ cata (de (D* X)) (μ(de (D* Y))) {?}

We are left with implementing the algebra of the catamorphism.
Intuitively, its role is to catch appearances of 'var x and replace
them by σ x. This corresponds to the following definition:

apply : (D : TagDesc)(X, Y : Set) → (X → μ(de (D* Y))) →
        ⟦de (D* X)⟧ (μ(de (D* Y))) → μ(de (D* Y))
apply D X Y σ ['var, x] ↦ σ x
apply D X Y σ [c, xs]   ↦ con [c, xs]

Filling the sub-goal with apply D X Y σ completes the
implementation. To sum up, we have implemented the free monad
construction for an arbitrary tagged description. This gives the
developer the ability, for any datatype, to extend it with a notion
of variable. Then, we have equipped this structure with the
corresponding monadic operations, bind and return. This construction is
an example of a type-indexed datatype [Hinze et al. 2002], as found
in Generic Haskell: from a datatype, we build a new datatype and
equip it with its structure.
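
The construction can be replayed as a minimal untyped Python sketch, under stated assumptions: a tagged description is a pair of a tag list and a dictionary from tags to description codes, terms are ("con", layer) tuples, the argument of 'var is encoded as a 'Σ over X followed by '1, and cata is written directly as a fold rather than derived from ind. The names free, apply_alg and ARITH are ours and purely illustrative.

```python
# Description codes (assumed encoding); 'hindx is omitted for brevity.
ONE = ("one",)
def sigma(branch): return ("sigma", branch)   # branch: s -> description
def indx(rest): return ("indx", rest)
def con(layer): return ("con", layer)


def fmap(desc, g, xs):
    """Apply g at each recursive position of one layer of data."""
    tag = desc[0]
    if tag == "one":
        return ()
    if tag == "sigma":
        s, d = xs
        return (s, fmap(desc[1](s), g, d))
    if tag == "indx":
        x, d = xs
        return (g(x), fmap(desc[1], g, d))
    raise ValueError(f"unknown description code: {tag}")


def cata(desc, algebra, term):
    """Fold a term: recursive positions first, then the algebra on the layer."""
    _, layer = term
    return algebra(fmap(desc, lambda t: cata(desc, algebra, t), layer))


def de(tagged):
    """Tagged description to plain description: a choice of tag, then its body."""
    _tags, bodies = tagged
    return sigma(lambda c: bodies[c])


def free(tagged):
    """The * operation: add a 'var constructor carrying one element of X."""
    tags, bodies = tagged
    return (["var"] + list(tags), {**bodies, "var": sigma(lambda _x: ONE)})


def var(x):
    """The return of the monad: embed a variable as a term."""
    return con(("var", (x, ())))


def apply_alg(s):
    """The algebra 'apply': substitute at 'var, rebuild every other node."""
    def algebra(layer):
        c, args = layer
        return s(args[0]) if c == "var" else con(layer)
    return algebra


def subst(tagged, s, term):
    """The bind of the monad, as a catamorphism over the extended description."""
    return cata(de(free(tagged)), apply_alg(s), term)


# A hypothetical signature with a constant 'one and a binary 'add.
ARITH = (["one", "add"], {"one": ONE, "add": indx(indx(ONE))})
one = con(("one", ()))
def add(a, b): return con(("add", (a, (b, ()))))

term = add(var("x"), add(var("y"), one))
closed_x = subst(ARITH, lambda name: one if name == "x" else var(name), term)
```

Substituting var for every variable leaves a term unchanged, which is one of the monad laws one would expect of this bind.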

5. A Universe of Inductive Families 

So far, we have explored the well-known realm of inductive types.
We have built upon our intuition of ML-like datatypes. In our
dependent setting, we have provided these datatypes by means
of Desc, a universe of descriptions.

Working with dependent types fosters new opportunities for
datatypes. The typical example is bounded lists, also known as
vectors. A vector is a list decorated by its length. Having this
information prevents hazardous operations, such as taking the head
of an empty vector: the head function only takes vectors of length
'suc n, as enforced by its type. This is made possible by the
specificity of dependent types: a term - the length - can influence
a type - the vector type.

However, these datatypes cannot be defined by mere induction. 
In the case of vectors, for instance, we have to define the whole 
family of vectors in one go: vectors of all sizes need to be defined at 
the same time. In dependently-typed languages, the basic grammar 
of datatypes is that of inductive families. To capture this grammar, 
we rely on indexing. 

5.1 The universe of indexed descriptions 

In the previous section, we have presented the Desc universe as a
grammar of functors in the category Set. We have seen how to
code inductive types in this setting. To describe an inductive family
indexed by I : Set, we use endofunctors on the category Set^I.
We call these indexed functors. I → IDesc I is our grammar for
describing these functors. Hence, IDesc and its interpretation have
the following types:

IDesc (I : Set) : Set
⟦_⟧ : (I : Set) → IDesc I → (I → Set) → Set

Given these components, a function R : I → IDesc I is interpreted
as a function I → Set^I → Set, which is isomorphic to
Set^I → Set^I, the type of endofunctors on Set^I. Inductive
families are fixpoints defined over these indexed functors,
hence computing a fixpoint of the entire family of functors:

Γ ⊢ I : Set    Γ ⊢ R : I → IDesc I
------------------------------------
Γ ⊢ μ_I R : I → Set

Γ ⊢ I : Set    Γ ⊢ R : I → IDesc I
Γ ⊢ i : I      Γ ⊢ x : ⟦R i⟧_I (μ_I R)
----------------------------------------
Γ ⊢ con x : μ_I R i

However, we still have to define the actual grammar. We obtain
it by evolving Desc to cope with indexing. The code of IDesc is
presented in Figure 6.

IDesc (I : Set) : Set
  'var (i : I)                      : IDesc I
  'const (A : Set)                  : IDesc I
  (D : IDesc I) '× (D′ : IDesc I)   : IDesc I
  'Σ (S : Set)(D : S → IDesc I)     : IDesc I
  'Π (S : Set)(D : S → IDesc I)     : IDesc I

⟦_⟧ : (I : Set) → IDesc I → (I → Set) → Set
⟦'var i⟧_I X   ↦ X i
⟦'const K⟧_I X ↦ K
⟦D '× D′⟧_I X  ↦ ⟦D⟧_I X × ⟦D′⟧_I X
⟦'Σ S D⟧_I X   ↦ (s : S) × ⟦D s⟧_I X
⟦'Π S D⟧_I X   ↦ (s : S) → ⟦D s⟧_I X

Figure 6. Universe of indexed descriptions

Induction on indexed descriptions is defined by:

indI : (I : Set)(R : I → IDesc I)(P : ((i : I) × μ_I R i) → Set) →
       ((i : I)(xs : ⟦R i⟧_I (μ_I R)) →
          ⟦AllI (R i) (μ_I R) xs⟧ P → P [i, con xs]) →
       (i : I)(x : μ_I R i) → P [i, x]
indI R P m i (con xs) ↦
  m i xs (allI (R i) (μ_I R) P (λ[i, xs]. indI R P m i xs) xs)

where the operators AllI and allI are presented in Figure 5. As for
descriptions, we can compute a generic catamorphism, cataI, from
indI.

5.2 Examples 

Natural numbers: In order to gain some intuition of IDesc, let us
re-implement the pattern functor of natural numbers:

NatD : IDesc 1
NatD ↦ 'Σ (#['zero 'suc]) ['const 1 'var []]

Because Nat is just an inductive type, NatD is a 1-indexed 
functor. Therefore, the recursive argument is materialised by 'var[], 
where we were using 'indx in the previous presentation. This 
transformation generalises to all inductive types. Moreover, we 
gain the ability to write mutually recursive inductive types. 

Indexed descriptions: Note that IDesc I itself is merely an inductive
type. Hence, we can describe it in IDesc 1:

IDescD : (I : Set) → IDesc 1
IDescD I ↦ 'Σ (#['var 'const '× 'Σ 'Π])
              [ 'const I
                'const Set
                'var [] '× 'var []
                'Σ Set (λS. 'Π S (λ_. 'var []))
                'Σ Set (λS. 'Π S (λ_. 'var [])) ]

Therefore, this universe is self-describing, hence can be levitated.
As before, we rely on a special-purpose switchID operator to
build the finite function [. . .] without mentioning IDesc.

Vectors: So far, the examples we have seen live in IDesc 1, hence
are not using any indexing. We remedy this by encoding vectors.
Recall that the constructors 'vnil and 'vcons are only defined
for an index 'zero and 'suc respectively:

data Vec (X : Set) : (n : Nat) → Set where
  'vnil  : Vec X 'zero
  'vcons : (n : Nat) → X → Vec X n → Vec X ('suc n)

One way to code constrained datatypes is to appeal to equality.
The constraints are therefore captured by equations in the datatype.
In this case, we obtain the following definition:

VecD : Set → Nat → IDesc Nat
VecD X n ↦ 'Σ (#['vnil 'vcons])
              [ 'const (n == 'zero)
                'Σ Nat λm. 'const X '× 'var m '×
                           'const (n == 'suc m) ]

In the 'vnil case, a proof must be provided that the index is
equal to 'zero. In the 'vcons case, we first store an element m
of Nat. However, the constraint stipulates that m cannot be just any
natural number: it must be "the index minus one". This translates
into the constraint n == 'suc m, given a suitable presentation of
propositional equality.

We have been careful to keep our setup agnostic with respect to 
notions of propositional equality. Any will do, according to your 
convictions, or for vectors, none — equality for Nat is definable 
by recursion — and many variations are popular. The traditional 
homogeneous identity type used in Coq is not adequate to support 
dependent pattern matching, but its heterogeneous variant, allowing 
equations between elements of arbitrary types, is sufficient to allow 
the translation of structurally recursive pattern matching programs 
to indI [Goguen et al. 2006]. Our present inclination is towards the
extensional equality proposed by Altenkirch et al. [2007], which 
also sustains the translation. 

However, sometimes, we can actually remove these equations 
altogether. Let us look back at Vec. We note that the equations are 
introduced because we are storing the index of the inductive family. 
However, inductive families need not store their indices [Brady 
et al. 2003]. By examining the incoming index, we can apply the 
forcing and de-tagging optimisations to our initial definition of Vec. 
This gives the following, equivalent definition: 



VecD (X : Set) : Nat → IDesc Nat
VecD X 'zero    ↦ 'const 1
VecD X ('suc n) ↦ 'const X '× 'var n

The equations (and constructors) have simply disappeared. A sim- 
ilar example is Fin, specified by: 



data Fin : (n : Nat) → Set where
  'Fz : (n : Nat) → Fin ('suc n)
  'Fs : (n : Nat) → Fin n → Fin ('suc n)

In this case, we can apply forcing, but not detagging, since 'Fz
and 'Fs both target 'suc:

FinD : Nat → IDesc Nat
FinD 'zero    ↦ 'Σ (#[]) []
FinD ('suc n) ↦ 'Σ (#['Fz 'Fs]) ['const 1 'var n]



We should point out that forcing a description is not guaranteed
to remove all constraints. It remains future work to see whether
constraints can be entirely eradicated, or presented more conveniently
to the developer. Finally, it is worth mentioning that these
optimisations are source-to-source transformations on descriptions.

Tagged indexed descriptions: When defining an indexed datatype, 
we have access to its index. Therefore, we can use this index to in- 
fluence the choice of constructors. This captures the essence of 
dependent datatypes: a term - the index - has the ability to influ- 
ence the datatype. We define tagged indexed descriptions to capture 
this specificity. 

We divide a tagged indexed description in two parts: first, the
constructors that do not depend on the index; then, the constructors
that do. The non-dependent part mirrors the definition for
non-indexed descriptions. The index-dependent part simply indexes the
choice of constructors by I. Hence, by inspecting the index, it is
possible to enable or disable constructors.

AllI : (I : Set)(D : IDesc I)(X : I → Set) →
       ⟦D⟧_I X → IDesc ((i : I) × X i)
AllI ('var i)   X x       ↦ 'var [i, x]
AllI ('const K) X k       ↦ 'const 1
AllI (D '× D′)  X [d, d′] ↦ AllI D X d '× AllI D′ X d′
AllI ('Σ S D)   X [s, d]  ↦ AllI (D s) X d
AllI ('Π S D)   X f       ↦ 'Π S (λs. AllI (D s) X (f s))

allI : (I : Set)(D : IDesc I)(X : I → Set)(P : ((i : I) × X i) → Set) →
       ((x : (i : I) × X i) → P x) → (xs : ⟦D⟧_I X) → ⟦AllI D X xs⟧ P
allI ('var i)   X P p x       ↦ p [i, x]
allI ('const K) X P p k       ↦ []
allI (D '× D′)  X P p [d, d′] ↦ [allI D X P p d, allI D′ X P p d′]
allI ('Σ S D)   X P p [s, d]  ↦ allI (D s) X P p d
allI ('Π S D)   X P p f       ↦ λa. allI (D a) X P p (f a)

Figure 5. Indexed induction predicates

TagIDesc I ↦ AlwaysD I × IndexedD I

AlwaysD I  ↦ (E : En) × ((i : I) → π E (λ_. IDesc I))

IndexedD I ↦ (F : I → En) × ((i : I) → π (F i) (λ_. IDesc I))

In the case of a tagged Vec, for instance, for the index 'zero, we
would only propose the constructor 'nil. Similarly, for 'suc n, we
would only propose the constructor 'cons.

We use the notation T7P to denote the indexed description 
computed from the tagged indexed description TID. Its expansion 
is similar to the definition of de but more involved. 

Typed expressions: We are going to define a syntax for a small 
typed language. We consider two types, natural numbers and 
booleans: 

Ty h-> #['nat 'bool] 

An expression of this language is either a value, a conditional 
expression, an addition of numbers, or a comparison of numbers. 
Informally, their type is the following: 



'cond : ∀ty : Ty. 'bool → ty → ty → ty
'plus : 'nat → 'nat → 'nat
'le   : 'nat → 'nat → 'bool
'val  : ∀ty : Ty. Val ty → ty



The function Val, used in the definition of 'val, simply maps a 
type ty of the object language to the corresponding type in the 
host language. Hence, the arguments of 'val are ensured to be of 
the expected type. We assume that Nat and Bool represent natural 
numbers and booleans in the host language, equipped with an 
addition operation plusHost and a comparison function leHost. We 
define Val as follows: 

Val : Ty → Set
Val 'nat  ↦ Nat
Val 'bool ↦ Bool

In our universe of descriptions, the syntax of this language 
is described by a tagged indexed description. We use the index 
to carry the type: the resulting description is indexed by Ty. We 
observe that some constructors are always defined, namely 'cond 
and 'val. On the other hand, the 'plus and 'le constructors are
index-dependent: 'plus is defined if and only if the result type - the
index - is 'nat, whereas 'le is defined if and only if the index is
'bool. The actual code precisely follows this intuition, as shown in
Figure 7.

Having implemented the syntax, we would like to describe its 
semantics. To do so, we implement an evaluator. The type of the 
evaluator is: 



eval⇓ : (ty : Ty) → μ_Ty ExprD ty → Val ty

The type of eval⇓ is strikingly similar to a catamorphism. Indeed,
implementing a single step of evaluation - the algebra - is
sufficient, as cataI gives, for free, the full evaluator.

ExprD : TagIDesc Ty
ExprD ↦ [ExprAD, ExprID]

ExprAD : AlwaysD Ty
ExprAD ↦ [ ['val 'cond],
           λty. [ 'const (Val ty)
                  'var 'bool '× 'var ty '× 'var ty ] ]

ExprID : IndexedD Ty
ExprID ↦ [ [ ['plus] ['le] ],
           λ_. [ 'var 'nat '× 'var 'nat ] ]

Figure 7. Syntax of typed expressions

The implementation is as follows:

eval↓ : (ty : Ty) → ⟦ExprD ty⟧_Ty Val → Val ty
eval↓ _     ['val, x]                  ↦ x
eval↓ _     ['cond, ['true, [x, _]]]   ↦ x
eval↓ _     ['cond, ['false, [_, y]]]  ↦ y
eval↓ 'nat  ['plus, [x, y]]            ↦ plusHost x y
eval↓ 'bool ['le, [x, y]]              ↦ leHost x y

eval⇓ : (ty : Ty) → μ_Ty ExprD ty → Val ty
eval⇓ ty term ↦ cataI_Ty ExprD Val eval↓ ty term

Hence, we have defined the syntax of a typed language of
arithmetic and boolean expressions. We have given its semantics
through an evaluation function. Given a one-step semantics of the
language, the big-step interpreter is granted without effort thanks to
the generic catamorphism.
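
The division of labour between the one-step algebra and the generic fold can be sketched in untyped Python, under stated assumptions: object types are the strings "nat" and "bool", a term is a (constructor, arguments) pair without an explicit con, the tagged indexed description is a function from an index to a dictionary of the constructors available there, and the indexed catamorphism is written directly instead of being derived from indI. The names expr_d, cata_i, eval_step and evaluate are ours.

```python
# Indexed description codes (assumed encoding); sets under 'const are unchecked.
def var(i): return ("var", i)
def const(): return ("const",)
def times(d1, d2): return ("times", d1, d2)


def expr_d(ty):
    """Constructors available at index ty: always-present ones, then indexed ones."""
    always = {"val": const(),
              "cond": times(var("bool"), times(var(ty), var(ty)))}
    indexed = {"nat": {"plus": times(var("nat"), var("nat"))},
               "bool": {"le": times(var("nat"), var("nat"))}}
    return {**always, **indexed[ty]}


def fmap_i(desc, g, xs):
    """Apply g, told the index, at each recursive position of one layer."""
    tag = desc[0]
    if tag == "var":
        return g(desc[1], xs)
    if tag == "const":
        return xs
    if tag == "times":
        return (fmap_i(desc[1], g, xs[0]), fmap_i(desc[2], g, xs[1]))
    raise ValueError(f"unknown description code: {tag}")


def cata_i(family, algebra, i, term):
    """Indexed fold; a constructor unavailable at index i raises KeyError."""
    c, args = term
    layer = fmap_i(family(i)[c],
                   lambda j, t: cata_i(family, algebra, j, t), args)
    return algebra(i, (c, layer))


def eval_step(ty, layer):
    """One step of evaluation: sub-terms have already been replaced by values."""
    c, args = layer
    if c == "val":
        return args
    if c == "cond":
        b, (x, y) = args
        return x if b else y
    if c == "plus":
        return args[0] + args[1]      # stands in for plusHost
    if c == "le":
        return args[0] <= args[1]     # stands in for leHost
    raise ValueError(f"unknown constructor: {c}")


def evaluate(ty, term):
    """The big-step evaluator, obtained from the algebra by the generic fold."""
    return cata_i(expr_d, eval_step, ty, term)


def val(x): return ("val", x)
def plus(a, b): return ("plus", (a, b))
def le(a, b): return ("le", (a, b))
def cond(b, t, e): return ("cond", (b, (t, e)))

example = cond(le(val(1), plus(val(2), val(3))), val(10), val(20))
result = evaluate("nat", example)
```

Because the fold consults the description at each index, a 'plus node at index "bool" is rejected before the algebra ever sees it, which is the run-time shadow of the typing discipline enforced statically by the index.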

However, so far, we are only able to define and manipulate 
closed terms. By abstracting over Val, it is possible to build and 
manipulate open terms, that is, terms with variables. Following Val, 
we define Var by: 

Var : En → Ty → Set
Var dom _ ↦ #dom

Whereas Val was mapping the type to the corresponding host type, 
Var maps types to a finite set. The finite set - the context - contains 
closed terms. A variable is therefore a 'val that contains a pointer 
to a particular element of the finite set - an element of #dom. 

Consequently, replacing Val ty by (Val ty + Var dom ty) in Figure 7
turns the language of closed terms into a language of open
terms with variables and constants. For readability, we abbreviate
λty. Val ty + Var dom ty by Val+Var_dom. This defines a new
indexed description, called ExprD_{Var,dom}.

Again, we would like to give a semantics to this extended
language. We proceed in two steps: first, we replace the variables
by their value in the context; then, we evaluate the resulting closed
term. Thanks to eval⇓, the second problem is already solved. Let
us focus on discharging variables from the context. Again, we can
subdivide this problem: first, discharging a single variable from the
context; then, applying this discharge function to every variable
in the term.

The discharge function is relative to the required type and a
context of the right type. Its action is to map values to themselves,
and variables to their value in context. This corresponds to the
following function:

discharge : (ty : Ty)(dom : En) →
            (γ : π dom (λ_. μ_Ty ExprD_{Var,dom} ty)) →
            (Val ty + Var dom ty) → μ_Ty ExprD_{Var,dom} ty
discharge ty dom γ (left x)  ↦ con ['val, x]
discharge ty dom γ (right v) ↦
  switch dom (λ_. μ_Ty ExprD_{Var,dom} ty) γ v

We are now left with applying discharge over all variables of 
the term. The type of this operation is the following: 

substExpr : (dom : En)
            (γnat : Γty dom 'nat)(γbool : Γty dom 'bool)
            (σ : (dom : En)
                 (γnat : Γty dom 'nat)
                 (γbool : Γty dom 'bool)
                 (ty : Ty) → (Val ty + Var dom ty) →
                 μ_Ty ExprD ty) →
            (ty : Ty) → μ_Ty ExprD_{Var,dom} ty →
            μ_Ty ExprD ty

where Γty corresponds to a context, defined by:

Γty : En → Ty → Set
Γty dom ty ↦ π dom (λ_. μ_Ty ExprD_{Var,dom} ty)



Abstracting away the book-keeping introduced by contexts, this 
definition looks familiar. It is similar to a monadic bind. This is not 
surprising as we are defining a first-order syntax with variables: 
our datatype enjoys more structure than what we are given. We are 
facing a free monad, where 'val is the return introducing variables. 
For convenience, we wrap discharge in a function that picks the
context of the right type:

σ dom γnat γbool ty var ↦ discharge ty dom γ_ty var

where γ_ty is short for case ty of { 'nat ↦ γnat ; 'bool ↦ γbool }.

Instead of implementing substExpr in this special case, we are 
now going to implement the free indexed-monad construction. 

5.3 Free indexed monad 

In Section 4.4, we have built a free monad operation for simple 
descriptions. The process is similar in the indexed world. Namely, 
given an indexed functor, we derive the indexed functor coding its 
free monad: 

_* : (I : Set) → (R : TagIDesc I)(X : I → Set) → TagIDesc I
[E, F]* X ↦
  [['cons 'var (π₀ E), λi. ['const (X i), (π₁ E) i]], F]

Just as in the universe of descriptions, this construction comes 
with an obvious return and a substitution operation, the bind. Its 
definition is the following: 

substI : (I : Set) → (R : TagIDesc I)(X, Y : I → Set) →
         ((i : I) → X i → μ_I (R* Y) i) →
         (i : I) → μ_I (R* X) i → μ_I (R* Y) i
substI R X Y σ i t ↦
  cataI (R* X) (μ_I (R* Y)) (applyI R X Y σ) i t

where applyI is defined as follows:

applyI : (I : Set) → (R : TagIDesc I)(X, Y : I → Set) →
         ((i : I) → X i → μ_I (R* Y) i) →
         (i : I) → ⟦(R* X) i⟧_I (μ_I (R* Y)) → μ_I (R* Y) i
applyI R X Y σ i ['var, x] ↦ σ i x
applyI R X Y σ i [c, ys]   ↦ con [c, ys]

Let us now consider two examples of free indexed monads.

Typed expressions: In Section 5.2, we had the intuition that our
datatypes ExprD and ExprD_{Var,dom} enjoy a monadic structure. We
had identified the variable substitution operation as the bind of
a free monad. To exhibit its monadic structure, we first have to
massage the definition of our datatype.

As previously mentioned, we identify 'val with the return of
the free monad, while the other components are the action of the
monad. As a result, the definition is similar to ExprD presented in
Figure 7, replacing ExprAD by ExprAD_Free:

ExprAD_Free : AlwaysD Ty
ExprAD_Free ↦ [ ['cond],
                λty. [ 'var 'bool '× 'var ty '× 'var ty ] ]

We call this datatype ExprD_Free. By a simple unfolding of
definitions, we note that ExprD_Free* Val corresponds to the syntax of
closed terms, ExprD. Similarly, ExprD_Free* (Val+Var_dom)
corresponds to expressions with variables, ExprD_{Var,dom}.

The evaluator for closed terms we implemented in Section 5.2
remains unchanged. It reduces closed terms of ExprD_Free* Val at
type ty to values in Val ty. We are left with implementing substExpr.
We simply have to fill in the right arguments to substI, the type
guiding us:

substExpr dom γnat γbool σ ty term ↦
  substI_Ty ExprD_Free (Val+Var_dom) Val
            (σ dom γnat γbool) ty term

This completes our implementation of the interpreter for open
terms.

We have defined a well-typed language of arithmetical expressions,
taking advantage of indexing. Then, we have implemented an
evaluator for closed terms, based on the generic catamorphism
function. Using the free monad construction, we have automatically
derived the language of open terms. Using its monadic structure, we
have implemented the interpreter for open terms in context. Hence,
without much effort, we have described the syntax of a well-typed
language, together with its semantics.

Indexed descriptions: Another instance of a free monad is IDesc
itself. Indeed, 'var is nothing but the return. The remaining
constructors are the carrier functor, trivially indexed by 1. The
carrier functor is described as follows:

IDescD_Free : AlwaysD 1
IDescD_Free ↦ [ ['const '× 'Σ 'Π],
                λ_. [ 'const Set
                      'var [] '× 'var []
                      'Σ Set (λS. 'Π S (λ_. 'var []))
                      'Σ Set (λS. 'Π S (λ_. 'var [])) ] ]

Then, we get IDesc by building its free monad:

IDescD : (I : Set) → TagIDesc 1
IDescD I ↦ [IDescD_Free, [λ_. [], λ_. []]]* λ_. I

The fact that indexed descriptions are closed under substitution
is potentially of considerable utility, if we can exploit this fact:

⟦σ D⟧_J X = ⟦D⟧_I (λi. ⟦σ i⟧_J X)    where σ : I → IDesc J

By observing that a description can be decomposed via substitution,
we split its meaning into a superstructure of substructures,
e.g. a 'database containing salaries', ready for traversal operations
preserving the former and targeting the latter.
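
The substitution equation can be sanity-checked on finite instances with a small Python sketch, under stated assumptions: every set is finite, a 'const is represented only by its number of elements, 'Σ and 'Π range over explicit Python lists, and instead of building the interpretation we count its elements. The names subst, card and the example description are ours; agreement of cardinalities is of course weaker than the equation itself.

```python
from math import prod

# Indexed description codes (assumed encoding); sets are finite throughout.
def var(i): return ("var", i)
def const(size): return ("const", size)       # a set known only by its size
def times(d1, d2): return ("times", d1, d2)
def sigma(dom, branch): return ("sigma", dom, branch)
def pi(dom, branch): return ("pi", dom, branch)


def subst(desc, s):
    """The bind of IDesc: replace every 'var i by the description s(i)."""
    tag = desc[0]
    if tag == "var":
        return s(desc[1])
    if tag == "const":
        return desc
    if tag == "times":
        return times(subst(desc[1], s), subst(desc[2], s))
    if tag in ("sigma", "pi"):
        dom, branch = desc[1], desc[2]
        return (tag, dom, lambda a: subst(branch(a), s))
    raise ValueError(f"unknown description code: {tag}")


def card(desc, x):
    """Number of elements of the interpretation of desc, where x(i) = |X i|."""
    tag = desc[0]
    if tag == "var":
        return x(desc[1])
    if tag == "const":
        return desc[1]
    if tag == "times":
        return card(desc[1], x) * card(desc[2], x)
    if tag == "sigma":
        return sum(card(desc[2](a), x) for a in desc[1])
    if tag == "pi":
        return prod(card(desc[2](a), x) for a in desc[1])
    raise ValueError(f"unknown description code: {tag}")


# A superstructure over indices "a" and "b" ...
shape = sigma([0, 1],
              lambda t: var("a") if t == 0 else times(var("b"), var("b")))

# ... whose positions are filled by substructures over the single index "j".
def fill(i):
    return const(2) if i == "a" else pi([0, 1, 2], lambda _: var("j"))

def x_sizes(j):
    return 3

lhs = card(subst(shape, fill), x_sizes)
rhs = card(shape, lambda i: card(fill(i), x_sizes))
```

Substituting 'var itself changes nothing, which is the unit law that makes 'var the return of this monad.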

In this section, we have presented the universe of indexed
descriptions. It embraces indexed families of types and, as such, allows
us to write dependent datatypes. We have presented several
examples of indexed datatypes. In this context, we have presented
the free monad construction, together with its monadic operations.

6. Discussion 

6.1 Universe stratification 

As such, our type theory suffers from an inconsistency. Indeed,
the typing rule Set : Set leads to Girard's paradox. We made
that choice for presentational convenience, as universe
stratification is orthogonal to our work. Nonetheless, our universe of
descriptions stratifies naturally. IDesc is self-encoding only in a
level-polymorphic sense. Unsurprisingly, IDesc at level l is of type
Set^(l+1). Similarly, the interpretation of IDesc at level l is an
object of type Set^l:

IDesc^l (I : Set^(l+1)) : Set^(l+1)
  'var (i : I)                          : IDesc^l I
  'const (A : Set^l)                    : IDesc^l I
  (D : IDesc^l I) '× (D′ : IDesc^l I)   : IDesc^l I
  'Σ (S : Set^l)(D : S → IDesc^l I)     : IDesc^l I
  'Π (S : Set^l)(D : S → IDesc^l I)     : IDesc^l I

⟦_⟧ : (I : Set^(l+1)) → IDesc^l I → (I → Set^l) → Set^l

Crucially, the types of data stored in an IDesc^l I all live no higher:
we may store an I and a Set^l in a Set^(l+1). The code for IDesc^l I is
an element of IDesc^(l+1) 1, so there is a spiral, not a cycle. We have
checked the construction using Agda's universe polymorphism,
coding IDesc in itself and proving the isomorphism between
the host and the embedded universes.

6.2 Related work 

Generic programming is a vast topic. We refer our reader to Garcia 
et al. [2003] for a broad overview of generic programming in 
various languages. In the sole context of Haskell, there is a myriad 
of proposals. These approaches are compared in Hinze et al. [2007] 
and Rodriguez et al. [2008]. 

Our approach follows the polytypic programming style, as
initiated by PolyP [Jansson and Jeuring 1997]. Indeed, we build 
generic functions by induction on pattern functors. Unlike PolyP, 
we do not have to resort to preprocessing: our datatypes are, na- 
tively, nothing but codes. 

We share with Generic Haskell the type-indexed datatype approach
[Hinze et al. 2002], as exemplified by the free monad construction:
from datatypes, we can compute new datatypes and equip
them with their structure. Generic Haskell also features generic
views [Holdermans et al. 2006], transparently transforming the 
structure of datatype definitions. An example is the tagged descrip- 
tions, presenting datatypes under a sum-of-sigmas angle. Unlike 
Generic Haskell, we do not have to modify the compiler to obtain 
views on datatypes: we can massage descriptions from inside our 
language. 

Unlike Generic Haskell, we do not support polykinded programming
[Hinze 2000]. Our descriptions are limited to endofunctors
on Set and Set^I. While we could encode higher-kinded
datatypes, we do not plan to adopt this strategy. As future work,
we plan to extend our universe to capture higher-kinded definitions
and generic functions over them. For the same reason, arity-generic
programming [Weirich and Casinghino 2010] is out of reach of our
current presentation.

Another generic programming paradigm is Scrap Your Boilerplate
[Lammel and Peyton Jones 2003] (SYB). Our proposal is different
in various ways. The cornerstone of SYB is the spine view
of datatype constructors. A piece of data is a spine composed of
a constructor applied to some arguments. SYB provides a combinator
library to write generic functions over spines. This relies
on a Typeable type-class, allowing dynamic dispatch to
datatype-specific operations. As a result, SYB is not reflexive: it is
restricted to datatypes instantiating Typeable. Moreover, it is limited
to building generic functions, hence type-indexed datatypes cannot
be implemented in this framework.

Generic programming in dependent types is not new either. 
Norell [2002] has given a formalization of polytypic programming 
in Alfa, a precursor of Agda. Similarly, Verbruggen et al. [2008, 
2009] have developed a framework for polytypic programming in 
the Coq theorem prover. However, these works aim at modelling 
PolyP or Generic Haskell in a dependently-typed setting for the 
purpose of proving correctness properties of Haskell code. Our 
approach is different in that we aim at building a foundation for 
datatypes, in a dependently-typed system, for a dependently-typed 
system. 

Closer to us is the work of Benke et al. [2003]. This seminal 
work introduced the usage of universes for developing generic 
programs. Our universes share similarities to theirs: our universe 
of descriptions is similar to their universe of iterated induction, and 
our universe of indexed descriptions is equivalent to their universe 
of finitary indexed induction. This is not surprising, as we share the 
same source of inspiration, namely induction-recursion. 

However, we differ in several ways. First, their approach is
generative: each universe extends the base type theory with both type
formers and elimination rules. Thanks to levitation, we only rely on
a generic induction and a specialised switchD. Second, the authors
do not tackle the issue of programming with codes. We have shown
how to abstract away codes and give a convenient presentation to
the developer. The authors often resort to an extensional equality,
while we have given an equality-agnostic presentation. Besides, our
universes are arranged so as to use definitional equality as much as
possible. Hence, in practice, the developer is relieved from many
proof obligations.

7. Conclusion 

In this paper, we have presented a universe of datatypes for a de- 
pendent type theory. To ensure the generality of our proposal, this 
system has been built in a familiar type theory, with no assump- 
tion about the underlying propositional equality. Because our ap- 
proach is extensively using codes for universes, we have given a 
rationalised presentation of codes. Thanks to type propagation, we 
make practical the usage of codes for datatypes. 

To introduce our approach, we have presented a universe of
descriptions. This universe has the expressive power of simple
inductive types, as found in ML-like languages. Further, we have
implemented this universe as a self-described object. Hence, for a
minimal extension of the type theory, we get a closed, self-describing
presentation of datatypes, where datatypes are just data.

To capture dependent datatypes, we generalise our presenta- 
tion to support indexing. The universe of indexed descriptions 
thus built encompasses inductive families. Again, this universe is 
self-described. We have developed several examples of dependent 
datatypes and generic functions over them. 

We have presented a self-describing, self-hosted universe for
datatypes. We have shown the benefit of such an approach, by our
ability to reflect datatypes in the type theory. This fosters a new
way of considering generic programming: just as programming.
Moreover, despite its chicken-and-egg nature, this presentation is
free of paradox: it has been formalised in Agda, admitting a correct
stratification.

Future work: As such, indexed descriptions do not cover sev- 
eral extensions of inductive families. One of them is induction- 
recursion. An interesting question is to locate indexed descriptions 
in the spectrum between inductive families and indexed induction- 
recursion. Another popular extension we plan to consider is to al- 
low internal fixpoints and higher-kinded datatypes. 

Also, we have presented a generic notion of syntax with vari- 
ables, thanks to the free monad construction. We would like to ex- 
plore a notion of syntax with binding. Interestingly, introducing in- 
ternal fixpoints or kinds would turn our universe into such syntax 
with binding. Once again, levitation would reveal itself convenient 
by providing generic tools to handle binding. 